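Edge computing is a popular target for accelerating the machine learning algorithms that support mobile devices, since it avoids the communication latency of processing them in the cloud. Edge deployments of machine learning mainly consider traditional concerns, such as the SWaP constraints (size, weight, and power) of the installation. However, given the significant contribution of embodied energy and carbon, such metrics are not sufficient to capture the environmental impact of computing. In this paper, we explore the trade-offs among convolutional neural network acceleration engines for both inference and online training. In particular, we explore the use of processing-in-memory (PIM) approaches, mobile GPU accelerators, and recently released FPGAs, and compare them with a novel racetrack memory PIM. Replacing PIM-enabled DDR3 with racetrack memory PIM can recover its embodied energy in as little as one year. For high activity ratios, mobile GPUs can be more sustainable than PIM-enabled racetrack memory, but they have a higher embodied energy to overcome.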
Translated by Google Translate
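The one-year recovery claim in the abstract above can be read as a break-even calculation: the extra embodied energy of a device is repaid once its operational savings add up to the same amount. The sketch below is only a simplified illustration of that arithmetic. The function `payback_years`, its parameters, and the linear savings model are assumptions made here, not the paper's methodology, and no figures from the paper are used.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def payback_years(extra_embodied_mj, power_saved_w, activity_ratio):
    """Years of use needed for operational savings to repay extra embodied energy.

    extra_embodied_mj: additional embodied (manufacturing) energy, in megajoules.
    power_saved_w: power saved while the device is active, in watts.
    activity_ratio: fraction of time the device is active, in (0, 1].
    All three inputs are hypothetical values supplied by the caller.
    """
    if not 0.0 < activity_ratio <= 1.0:
        raise ValueError("activity_ratio must be in (0, 1]")
    if power_saved_w <= 0.0:
        raise ValueError("power_saved_w must be positive")
    joules_saved_per_year = power_saved_w * activity_ratio * SECONDS_PER_YEAR
    return extra_embodied_mj * 1e6 / joules_saved_per_year
```

Under this model, halving the activity ratio doubles the payback time, which is one way to see why the comparison between accelerators in the abstract depends on how busy the device is.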
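Pretrained language models have been successful in many areas of natural language processing, including reading comprehension tasks. However, when machine learning methods are applied to new domains, labeled data may not always be available. To address this, we use supervised pretraining on source-domain data to reduce the sample complexity of domain-specific downstream tasks. We evaluate zero-shot performance on domain-specific reading comprehension tasks by combining task transfer with domain adaptation to fine-tune a pretrained model without any labeled data from the target task. Our approach outperforms domain-adaptive pretraining on downstream domain-specific reading comprehension tasks in 3 out of 4 domains.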
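Language models demonstrate both quantitative improvement and new qualitative capabilities as their scale increases. Despite their potentially transformative impact, these new capabilities are still poorly characterized. To inform future research, prepare for disruptive new model capabilities, and mitigate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing on linguistics, childhood development, mathematics, common-sense reasoning, biology, physics, social bias, software development, and more. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes ranging from millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but remain poor in absolute terms (and in comparison with rater performance); performance is remarkably similar across model classes, although sparsity offers some benefit; tasks that improve gradually and predictably usually involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.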
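Meta-learning has traditionally relied on backpropagation through entire tasks to iteratively improve a model's learning dynamics. However, this approach becomes computationally intractable when scaled to complex tasks. We propose a distributed evolutionary meta-learning strategy using Tensor Processing Units (TPUs) that is highly parallel and scales to arbitrarily long tasks with no increase in memory cost. Using a Prototypical Network trained on the Omniglot dataset, we achieve 98.4% accuracy on a 5-shot classification problem. Our algorithm uses up to 40 times less memory than computing the gradient with automatic differentiation, and the resulting model reaches an accuracy within 1.3% of a backpropagation-trained equivalent (99.6%). We observe higher classification accuracy, up to 99.1%, with larger population configurations. We further validate experimentally the stability and performance of ES-ProtoNet across a variety of training conditions (varying population size, model size, number of workers, shot, way, ES hyperparameters, etc.). Our contributions are twofold: we provide the first assessment of evolutionary meta-learning in a supervised setting, and we create a general framework for distributed evolution strategies on TPUs.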
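The memory advantage described above comes from the fact that evolution strategies estimate a gradient from forward evaluations alone, so no activations need to be stored for a backward pass. The following is a minimal single-process sketch of a generic evolution-strategies update with mirrored (antithetic) perturbations on a toy quadratic loss. It is not the ES-ProtoNet implementation; `es_step`, `toy_loss`, `run_es`, and all hyperparameter values are illustrative assumptions.

```python
import random


def es_step(theta, loss_fn, rng, population=20, sigma=0.1, lr=0.05):
    """One evolution-strategies update using mirrored Gaussian perturbations.

    Only forward evaluations of loss_fn are needed; no backward pass is run.
    """
    grad = [0.0] * len(theta)
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        loss_plus = loss_fn([t + sigma * e for t, e in zip(theta, eps)])
        loss_minus = loss_fn([t - sigma * e for t, e in zip(theta, eps)])
        # Finite-difference estimate of the directional derivative along eps.
        scale = (loss_plus - loss_minus) / (2.0 * sigma * population)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    # Plain gradient descent on the estimated gradient.
    return [t - lr * g for t, g in zip(theta, grad)]


def toy_loss(theta):
    """Hypothetical stand-in for a task loss: squared distance from 3.0."""
    return sum((t - 3.0) ** 2 for t in theta)


def run_es(steps=200, seed=0):
    """Run the update repeatedly from the origin and return the parameters."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        theta = es_step(theta, toy_loss, rng)
    return theta
```

In a distributed setting each worker would evaluate a share of the perturbations and only scalar losses would need to be exchanged, but that part is left out of this sketch.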
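Face anonymization with generative models has become increasingly prevalent, because these models sanitize private information by generating virtual face images, ensuring both privacy and image utility. Such virtual face images are usually not identifiable once the original identity has been removed or protected. In this paper, we formalize and tackle the problem of generating identifiable virtual face images. Our virtual face images are visually different from the originals, which protects privacy. In addition, they carry new virtual identities that can be used directly for face recognition. We propose an Identifiable Virtual Face Generator (IVFG) to generate the virtual face images. The IVFG projects the latent vectors of the original face images into virtual ones according to a user-specific key, and the virtual face images are generated from the projected vectors. To make the virtual face images identifiable, we propose a multi-task learning objective together with a triplet-style training strategy to learn the IVFG. We evaluate the performance of our virtual face images using different face recognizers on different face image datasets, all of which demonstrate the effectiveness of the IVFG in generating identifiable virtual face images.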
A deeper understanding of cause and effect within observational data is critical across many domains, such as economics, health care, public policy, web mining, online advertising, and marketing campaigns. Although significant advances have been made in overcoming the challenges of causal effect estimation with observational data, such as missing counterfactual outcomes and selection bias between treatment and control groups, existing methods mainly focus on source-specific and stationary observational data. Such learning strategies assume that all observational data are already available during the training phase and come from only one source. This practical concern of accessibility is ubiquitous in various academic and industrial applications. In short, in the era of big data, we face new challenges in causal inference with observational data: extensibility to incrementally available observational data, adaptability to additional domain adaptation problems beyond the imbalance between treatment and control groups, and accessibility of an enormous amount of data. In this position paper, we formally define the problem of continual treatment effect estimation, describe its research challenges, and then present possible solutions to this problem. Moreover, we discuss future research directions on this topic.
The growing interest in intelligent services and privacy protection for mobile devices has given rise to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) has been proposed to train related but personalized ML models for different devices, but previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can enable both efficient communication and model heterogeneity among clients, but existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT) is proposed. FedICT keeps local and global knowledge apart during bi-directional distillation between clients and the server, aiming to enable multi-task clients while alleviating the client drift that arises from divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD reinforces the clients' fitting of local data by introducing prior knowledge of local data distributions. LKA corrects the distillation loss of the server, making the transferred local knowledge better match the generalized representation. Experiments on three datasets show that FedICT significantly outperforms all compared benchmarks under various data heterogeneity and model architecture settings, achieving improved accuracy with less than 1.2% training communication overhead compared with FedAvg and no more than 75% of the training communication rounds compared with FedGKT.
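As background for the distillation step mentioned above, the sketch below shows the generic temperature-softened distillation loss commonly used in knowledge distillation: the KL divergence from the teacher's softened distribution to the student's, scaled by the squared temperature. It is not FPKD or LKA, whose exact forms the abstract does not give. The function names and the default temperature are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs.

    The temperature**2 factor keeps gradient magnitudes comparable across
    temperatures, following the usual distillation convention. Extremely
    large logit gaps could underflow the student probabilities; this sketch
    does not guard against that.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0.0)
    return temperature ** 2 * kl
```

Because only logits are exchanged, the teacher and the student can have different architectures, which is what makes distillation attractive for the heterogeneous-model setting the abstract describes.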
We derive a set of causal deep neural networks whose architectures are a consequence of tensor (multilinear) factor analysis. Forward causal questions are addressed with a neural network architecture composed of causal capsules and a tensor transformer. The former estimate a set of latent variables that represent the causal factors, and the latter governs their interaction. Causal capsules and tensor transformers may be implemented using shallow autoencoders, but for a scalable architecture we employ block algebra and derive a deep neural network composed of a hierarchy of autoencoders. An interleaved kernel hierarchy preprocesses the data resulting in a hierarchy of kernel tensor factor models. Inverse causal questions are addressed with a neural network that implements multilinear projection and estimates the causes of effects. As an alternative to aggressive bottleneck dimension reduction or regularized regression that may camouflage an inherently underdetermined inverse problem, we prescribe modeling different aspects of the mechanism of data formation with piecewise tensor models whose multilinear projections are well-defined and produce multiple candidate solutions. Our forward and inverse neural network architectures are suitable for asynchronous parallel computation.
Reinforcement learning (RL) is one of the most important branches of AI. Due to its capacity for self-adaptation and decision-making in dynamic environments, reinforcement learning has been widely applied in multiple areas, such as healthcare, data markets, autonomous driving, and robotics. However, some of these applications and systems have been shown to be vulnerable to security or privacy attacks, resulting in unreliable or unstable services. A large number of studies have focused on these security and privacy problems in reinforcement learning. However, few surveys have provided a systematic review and comparison of existing problems and state-of-the-art solutions to keep up with the pace of emerging threats. Accordingly, we present such a comprehensive review to explain and summarize the challenges associated with security and privacy in reinforcement learning from a new perspective, namely that of the Markov Decision Process (MDP). In this survey, we first introduce the key concepts related to this area. Next, we cover the security and privacy issues linked to the state, action, environment, and reward function of the MDP, respectively. We further highlight the special characteristics of security and privacy methodologies related to reinforcement learning. Finally, we discuss possible future research directions within this area.
The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.